CLASSIFICATION ACCURACY OF DIFFERENT 
OPTIONS OF THE IMAGE-100 SYSTEM 

R. Kumar and M. Niero* 

ABSTRACT 

The purpose of this study was to compare the 
classification accuracy of land use classes of Sao Jose dos Campos, SP, 
Brazil, using the different options of signature acquisition for 
classification available in the Image-100, a system developed by 
General Electric Co., the pixel-by-pixel maximum likelihood gaussian 
classifier (MLGC), and the sample classifier. In addition, the 
statistical separability of land use classes in the subsets of one to 
four spectral channels was investigated. With the help of ground 
observations and aerial photography, the multispectral scanner (MSS) data 
of LANDSAT were analysed using the Image-100. For the single-cell option 
of the Image-100, the errors of omission varied from 16.3% for the class 
"commercial" to 26.8% for the class "residential". The errors of 
commission varied from 5.6% for the class "commercial" to 33.2% for the 
class "unoccupied". As expected, the multi-cell option increased the 
errors of omission and decreased the errors of commission. However, 
considering both the errors of omission and commission, this option 
considerably decreased the percentage of correct classification as 
compared to the single-cell option. On the whole, the sample classifier 
gave slightly more accurate results than MLGC and much more accurate 
than any of the options of classification available in Image-100. 


* The authors are with the Instituto de Pesquisas Espaciais (INPE), 
Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq), 
12200 - Sao Jose dos Campos, SP, Brasil. 
INTRODUCTION 

The purpose of this study was to compare the 
classification accuracy of land use classes using the different options 
of signature acquisition for classification available in the Image-100 
(Image-100 is a data processing system marketed by General Electric 
Co. that extracts thematic information from multispectral imagery, 
enhances the image, etc.). In addition, the statistical separability 
of land use classes in the subsets of one to four spectral channels was 
investigated. 


Cloud-free multispectral scanner data from LANDSAT, of 
reasonable quality, over Sao Jose dos Campos (23° 10' S, 45° 50' W), Sao 
Paulo, Brasil, acquired on September 8, 1972, were available. In 
addition, aerial photography and ground observations were available to 
assist the analysis of the data. Sao Jose dos Campos was selected 
because it is one of the fastest growing small-size towns of Brasil and 
the authors are well familiar with it. Many of the problems of this 
town are similar to the problems of much larger urban centers. 

With the help of ground observations and aerial 
photography, a map of Sao Jose dos Campos showing the following land 
use classes was obtained: residential areas, commercial areas, 
agricultural areas and unoccupied areas. 

The specific objectives of the study are stated as 
follows: 

1. To determine what combinations of one through three spectral 
channels out of four available channels give the greatest 
overall statistical separability of the above four land use 
classes. 

2. To compare the classification accuracy of land use classes 
using the 'single-cell signature acquisition' and 'multi-cell 
signature acquisition' options of classification available in 
the Image-100, the pixel-by-pixel maximum likelihood gaussian 
classifier (MLGC) and a sample classifier, on-line-mode, in 
the Image-100. 

LITERATURE REVIEW 

Many investigators have analysed the multispectral 
scanner (MSS) data of LANDSAT satellite for applications to land use 
classification. For example, Todd and Baumgardner 1 (1973) analysed 
LANDSAT MSS data obtained over Marion County (Indianapolis), Indiana, 
by computer-implemented techniques to evaluate the utility of satellite 
data for urban land use classification. Several land use classes, such 
as commerce/industry, single-family (newer) residential, trees, and 
water exhibited spectrally separable characteristics and were 
identified with greater than 90 percent accuracy. Ellefsen et al. 2 
(1973) did computer-aided analysis of LANDSAT MSS data of the San 
Francisco Bay area. Smith et al. 3 (1974) have described the application 
of spatial features to satellite land-use analysis. Ellefsen et al. 4 
(1974) have presented new techniques in mapping urban land use and 
monitoring change for selected U.S. metropolitan areas. They analysed 
LANDSAT MSS data using automatic pattern recognition techniques for 
classification. Kumar and Silva 5 (1977) have analysed, in considerable 
detail and with a large quantity of data, the statistical separability 
of agricultural cover types in the subsets of one to twelve spectral 
channels. 

Goldberg et al. 6 (1975) have described methods and 
procedures which outside investigators may use with the automated 
processing equipment of the Canada Centre for Remote Sensing (CCRS) for 
the purpose of natural resource exploration and mapping. They have 
compared the accuracies of unsupervised and supervised methods on the 
basis of the confusion matrices generated by classifying exactly the 
same area. 

METHOD OF ANALYSIS 


Multispectral scanner data of computer compatible tapes 
of LANDSAT were analysed using the Image-100. With the aid of the land 
use map of Sao Jose dos Campos mentioned above, rectangular areas of 
each of the above four land use classes were selected on the display of 
the Image-100, avoiding the boundaries of classes. The areas of each of 
these classes were selected carefully so that they could be considered 
to be representative of the respective land use classes. Assuming that 
each of these classes has a multivariate gaussian distribution, the 
B-distance based on the Bhattacharyya coefficient was calculated between 
all possible pairs of these classes in all possible combinations of one, 
two, three and four spectral channels using the feature selection 
algorithm of the Brazilian Institute of Space Research (INPE), 
on-line-mode, with the Image-100 7,8. For each value of B-distance, the 
probability of correct classification was estimated from the curve of 
Swain and King (1973) 9. The B-distance for two multivariate gaussian 
distributions is given by 9: 


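\[ B = 2 \left( 1 - e^{-a} \right) \tag{1} \]

where 

\[ a = \frac{1}{8} (U_1 - U_2)^T \, \Sigma^{-1} (U_1 - U_2) + \frac{1}{2} \log_e \left[ \frac{|\Sigma|}{|\Sigma_1|^{1/2} \, |\Sigma_2|^{1/2}} \right] \tag{2} \]

where U_1 and U_2 are the mean vectors of classes one and two respectively, 
whereas \Sigma_1 and \Sigma_2 are the covariance matrices of the same two classes, 

\[ \Sigma = \frac{\Sigma_1 + \Sigma_2}{2} \tag{3} \]

and T denotes transpose. 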
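
As an illustration, equations (1) to (3) can be evaluated numerically along the following lines. This is a minimal Python/NumPy sketch and not the INPE feature selection program itself; the function name b_distance and its argument names are hypothetical, and the class means and covariance matrices are assumed to have been estimated beforehand from the training areas.

```python
import numpy as np

def b_distance(mu1, cov1, mu2, cov2):
    """B-distance, B = 2 (1 - exp(-a)), between two gaussian classes.

    mu1, mu2   : mean vectors of the two classes (hypothetical argument names)
    cov1, cov2 : covariance matrices of the two classes
    """
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    cov = 0.5 * (cov1 + cov2)                 # equation (3)
    diff = mu1 - mu2

    # First term of equation (2): (1/8) (U1 - U2)^T Sigma^-1 (U1 - U2)
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Second term of equation (2), computed from log-determinants
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    a = term_mean + term_cov
    return 2.0 * (1.0 - np.exp(-a))           # equation (1)
```

With this definition B is 0 for two identical distributions and approaches 2 as the classes become fully separable.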


The average B-distance over all pairs of classes is given 
by: 

\[ B_{AVE}(C_1, C_2, \ldots, C_n) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} B(i, j \mid C_1, C_2, \ldots, C_n) \tag{4} \]

where 

m = number of classes 

B(i, j | C_1, C_2, ..., C_n) = B-distance between classes i and j in the 
channels C_1, C_2, ..., C_n. 

The average B-distance, B_AVE, was computed for all possible subsets of one, two, 
three and four spectral channels out of the available four channels. 
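
The averaging over class pairs and the search over channel subsets can be sketched as follows. This is only an illustrative Python outline under stated assumptions: the names average_distance and rank_subsets are hypothetical, channels are addressed by 0-based index rather than by LANDSAT channel number, and the pairwise separability measure is passed in as the `distance` argument (any function of two means and two covariance matrices, such as a B-distance routine).

```python
from itertools import combinations

import numpy as np

def average_distance(means, covs, channels, distance):
    """Average pairwise class distance in a channel subset, as in equation (4).

    means    : list of per-class mean vectors
    covs     : list of per-class covariance matrices
    channels : tuple of 0-based channel indices forming the subset
    distance : callable(mu1, cov1, mu2, cov2) -> float (assumed supplied)
    """
    idx = list(channels)
    m = len(means)
    total = 0.0
    for i, j in combinations(range(m), 2):    # all pairs with i < j
        total += distance(
            np.asarray(means[i])[idx], np.asarray(covs[i])[np.ix_(idx, idx)],
            np.asarray(means[j])[idx], np.asarray(covs[j])[np.ix_(idx, idx)],
        )
    return 2.0 * total / (m * (m - 1))

def rank_subsets(means, covs, n_channels, size, distance):
    """Rank all channel subsets of a given size, best (largest) first."""
    scores = [
        (subset, average_distance(means, covs, subset, distance))
        for subset in combinations(range(n_channels), size)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Calling rank_subsets with size 1, 2 and 3 on four channels would enumerate the 4, 6 and 4 subsets, respectively, that are compared in this study.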

Each of these land use classes was divided into two 
independent sets: training fields and test fields. Using training 
fields of residential areas, test fields of each of the above four 
classes were classified using the single-cell signature acquisition 
option of Image-100. This option creates a four-dimensional rectangular 
parallelepiped, each side of which corresponds to the signature limits 
of the training areas in each channel. The number of pixels classified as 
residential areas by the computer inside the test fields of each of 
these four classes was determined. An identical analysis was repeated 
for each of the other three land use classes. Thus, a confusion matrix 
showing the total number of pixels (picture elements) of each class 
classified correctly as well as classified incorrectly into each of the 
other classes was obtained. 
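
The single-cell procedure described above can be sketched in Python as follows. This is an illustration rather than the Image-100 implementation: the function names are hypothetical, and the signature limits are assumed here to be the minimum and maximum training values in each channel, which the text does not state explicitly.

```python
import numpy as np

def single_cell_signature(train_pixels):
    """Per-channel lower and upper limits of a training field.

    Assumption: the limits are the min and max of the training pixels.
    """
    train_pixels = np.asarray(train_pixels)
    return train_pixels.min(axis=0), train_pixels.max(axis=0)

def in_signature(pixels, signature):
    """True for every pixel lying inside the rectangular parallelepiped."""
    low, high = signature
    pixels = np.asarray(pixels)
    return np.all((pixels >= low) & (pixels <= high), axis=1)

def confusion_matrix(train_fields, test_fields):
    """counts[i, j] = test pixels of class i accepted by the signature of class j.

    train_fields, test_fields : dicts mapping class name -> (n_pixels, n_channels)
    """
    names = list(train_fields)
    signatures = {n: single_cell_signature(train_fields[n]) for n in names}
    counts = np.zeros((len(names), len(names)), dtype=int)
    for i, true_class in enumerate(names):
        for j, sig_class in enumerate(names):
            accepted = in_signature(test_fields[true_class], signatures[sig_class])
            counts[i, j] = int(accepted.sum())
    return names, counts
```

Because each class signature is applied independently, a pixel may be accepted by more than one signature or by none, so the rows of this matrix need not sum to the number of test pixels.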

This whole procedure was repeated for the multicell 
signature acquisition option of the Image-100. In the multicell 
signature acquisition, the parallelepiped of spectral signature is 
subdivided into cells, each of unit volume, and the number of pixels in 
each of these unit cells is counted. These cell counts are thus measures 
of the probability distribution of the spectral cluster. By raising or 
lowering the threshold on the cell counts, one can vary the size of the 
four-dimensional probability distribution of the spectral cluster by 
deleting or adding cells with counts greater than the variable 
threshold. In the interactive signature modification option, the user 
performs training on the misclassified area, adding the errors of 
omission and subtracting the errors of commission until satisfied with 
the results. 
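
A rough Python sketch of the multicell thresholding idea is given below. It is not the Image-100 code: the function names are hypothetical, a unit cell is assumed to correspond to one integer digital count in each channel, and a cell is kept when it holds at least m training pixels (that is, cells with fewer than m pixels are deleted).

```python
from collections import Counter

import numpy as np

def multicell_signature(train_pixels, m):
    """Set of unit cells that contain at least m training pixels.

    Assumption: pixel values are integer digital counts, so each distinct
    four-channel value is one unit cell of the spectral space.
    """
    cell_counts = Counter(tuple(int(v) for v in pixel) for pixel in train_pixels)
    return {cell for cell, count in cell_counts.items() if count >= m}

def classify_multicell(pixels, signature):
    """True for each pixel whose unit cell belongs to the signature."""
    return np.array([tuple(int(v) for v in pixel) in signature for pixel in pixels])
```

Raising m shrinks the signature, which is why a larger threshold tends to reject more pixels of the class itself while accepting fewer pixels of other classes.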


The same training fields of each class were used to 
classify the test fields using MLGC as well as the sample classifier 
based on B-distance. B-distance was computed between a test field and 
each of the four training classes, and the field was classified into the 
class for which the B-distance was minimum. 
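
For comparison, a pixel-by-pixel maximum likelihood gaussian decision rule can be sketched as below. The text does not give the details of the MLGC used, so this Python outline is an assumption-laden illustration: equal prior probabilities are assumed for all classes, the function names are hypothetical, and class statistics are estimated from the training pixels with NumPy.

```python
import numpy as np

def fit_gaussian(train_pixels):
    """Mean vector and covariance matrix of a training field."""
    x = np.asarray(train_pixels, dtype=float)
    return x.mean(axis=0), np.cov(x, rowvar=False)

def log_likelihood(pixels, mean, cov):
    """Gaussian log-likelihood of each pixel, up to an additive constant."""
    x = np.asarray(pixels, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every pixel from the class mean
    maha = np.einsum("ij,ij->i", x @ np.linalg.inv(cov), x)
    return -0.5 * (logdet + maha)

def mlgc_classify(pixels, class_stats):
    """Assign each pixel to the class of highest likelihood (equal priors assumed).

    class_stats : dict mapping class name -> (mean, cov)
    """
    names = list(class_stats)
    scores = np.stack([log_likelihood(pixels, *class_stats[n]) for n in names])
    return [names[k] for k in scores.argmax(axis=0)]
```

The sample classifier differs in that a whole test field, summarized by its own mean vector and covariance matrix, is assigned to the training class at minimum B-distance, instead of deciding pixel by pixel.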


Table I gives the values of the average B-distance (B_AVE) in all possible 
combinations of one, two, three and four channels out of the four 
available channels. As one would expect, the values of B_AVE increase 
with an increase in the number of channels. In the subsets of one to 
three spectral channels, channel 4, channels 4 & 7 (one in the visible 
and one in the near infrared), and channels 4, 5 & 7 (two in the visible 
and one in the near infrared) are found to be the best choices. Table I 
shows that in the subset of two channels, channels 4 and 5 (visible 
wavelength region) give higher probability of correct classification 
than channels 6 & 7 (near infrared wavelength region). The authors 
believe that each wavelength region (visible, near infrared, middle 
infrared and thermal infrared) has independent information content. Thus, 
in the subset of two spectral channels, one channel in the visible and 
one channel in the near infrared wavelength region are found to be the 
best choice. Kumar (1978) 10 has analysed aircraft-collected MSS data, in 
considerable detail and with a large quantity of data, in the subsets of 
one to twelve spectral channels to evaluate each spectral channel as well 
as possible combinations of wavelength regions for statistical 
separability of agricultural cover types. 


The errors of omission (while using training fields of 
residential areas, the pixels of test fields known to be 
residential but not classified as residential constitute the errors of 
omission, etc.) and the errors of commission (while using training 
fields of residential areas, the pixels of classes other than 
residential which are classified by the Image-100 as residential) 
were calculated and are shown in Table II. Similarly, the errors of 
omission and commission using the multicell signature acquisition (m = 1, 
m = 2 and m = 3), for the same training and test fields of each class, 
were calculated and are given in Table II. The option m = 1 means that 
all the unit cells in the four-dimensional spectral space which had 
fewer than one pixel were deleted from the spectral signature of the 
training fields for doing classification. Similarly, the option m = 2 
means that all the unit cells in the four-dimensional spectral space 
which had fewer than two pixels were deleted from the spectral signature 
of the training fields for doing classification, etc. Table II shows that 
for the single-cell option, the errors of omission vary from 16.3% for 
the class "commercial" to 26.8% for the class "residential". The errors 
of commission vary from 5.6% for the class "commercial" to 33.2% for the 
class "unoccupied". It shows that classification accuracy for all the 
classes is rather poor except for the class "commercial", where the 
percentage of errors is reasonably small (errors of omission = 16.3%, 
commission = 5.6%). This is because of the small values of standard 
deviation of this class (and hence, less overlap with other classes) in 
each of the spectral channels, especially in channels one (0.5 to 0.6 
µm) and four (0.8 to 1.1 µm), i.e., LANDSAT MSS channels 4 and 7. 
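
The two error measures defined above can be computed from a matrix of pixel counts along the following lines. This Python sketch is illustrative only: the function name is hypothetical, and the denominators are assumptions, since the text does not state them. Here omission is expressed as a percentage of the test pixels known to belong to the class, and commission as a percentage of all pixels classified into the class.

```python
import numpy as np

def omission_commission(counts, n_test):
    """Percentage errors of omission and commission per class.

    counts[i, j] : number of test pixels of class i classified as class j
    n_test[i]    : number of test pixels known to belong to class i

    Assumed denominators (not stated in the text):
      omission   -> n_test of the class
      commission -> all pixels classified into the class (column sum)
    """
    counts = np.asarray(counts, dtype=float)
    n_test = np.asarray(n_test, dtype=float)
    correct = np.diag(counts)
    classified_as = counts.sum(axis=0)

    omission = 100.0 * (n_test - correct) / n_test
    commission = 100.0 * (classified_as - correct) / classified_as
    return omission, commission
```

For example, if 80 of 100 residential test pixels are classified as residential, the error of omission for that class is 20% under this convention.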

In general, an increase in the standard deviations of a 
class in the spectral channels tends to reduce the errors of omission 
and increase the errors of commission. It was found that taking into 
account both the errors of omission as well as those of commission, the 
classification accuracy generally decreases with an increase in the 
standard deviations. 

Table II shows, as expected, that the multicell option 
increases the errors of omission and decreases the errors of 
commission. Considering the errors of omission as well as errors of 
commission, the multicell option for m = 1 considerably decreases the 
percentage of correct classification for each of the classes. This is 
because the number of pixels used for training of each class was 
relatively small for statistical purposes. Thus, the unit cells in the 
four-dimensional spectral space were sparsely populated, and there may 
be many cells which are actually representative of the class but do not 
have any pixel because the total number of pixels for training for each 
of the classes was rather small. For the multicell option, the errors of 
omission increase and the errors of commission decrease as we go from 
m = 1 to m = 2 to m = 3. Considering the errors of omission as well as 
the errors of commission, the percentage of correct classification 
decreases as we go from m = 1 to m = 2 to m = 3. 

Table II also shows that the interactive signature 
acquisition option does not improve the classification accuracy as 
compared to the "single cell" option because the basic problem is the 
overlap between the classes in the four-dimensional spectral space. 

Table II shows the results of classification using pixel- 
by-pixel maximum likelihood gaussian classifier (MLGC) as well as a 
sample classifier. As pointed out earlier, the same training and test 
fields were used in this case as in the "single cell" or "multicell 
option" of the Image-100. 

Comparing these results to the single cell option, we 
find that the errors of commission are reduced significantly, whereas 
the errors of omission are increased for some classes and decreased for 
the others. For most classes, the larger the standard deviations, 
the lower the percentage of correct classification using the sample 
classifier. Comparing the sample classifier to the 'multicell option', we 
find that it gives much smaller errors of omission. However, MLGC gives 
higher errors of commission, whereas the sample classifier gives greater 
errors of commission for some classes and smaller for the others. On the 
whole, the sample classifier gives a percentage of correct 
classification slightly better than MLGC and much better than the single 
cell or multicell options of the Image-100. 

In the future, computer compatible tapes of other dates, 
being developed at INPE in Sao Jose dos Campos, will be analysed to 
investigate the effect of time on these results. 
Table I - Values of B_AVE in subsets of one to four channels 

Channel    P_c      Channels    P_c      Channels    P_c 
4          84.3     4-5         85.0     4-5-6       86.6 
5          84.0     4-6         85.0     4-5-7       88.5 
6          74.5     4-7         86.1     4-6-7       86.7 
7          74.4     5-6         85.1     5-6-7       84.6 
                    5-7         86.0     4-5-6-7     89.0 
                    6-7         79.8 

Note: P_c denotes probability of correct classification estimated 
from the values of B_AVE using the curve of Swain and King 9. 


Table II - Percentage errors of omission and commission 